The quantum circuit model allows gates between any pair of qubits, yet physical instantiations
allow only limited interactions. We address this problem by providing an interaction graph together
with an efficient method for compiling quantum circuits so that gates are applied only locally. The
graph requires each qubit to interact with only 4 other qubits, and yet the time overhead for implementing
any n-qubit quantum circuit is 6 log n. Building a network of quantum computing nodes according
to this graph enables the network to emulate a single monolithic device with minimal overhead.


I. INTRODUCTION 

Just as with their classical counterparts, quantum algorithms will be compiled into a sequence of elementary physical operations. Quantum algorithms use arbitrary two-qubit interactions since, in the circuit model, gates can be applied to any pair of qubits. However, after quantum error correction the allowed logical interactions are limited to a graph that typically has low degree. Beals et al. give a sequence of SWAP gates permuting the qubits so that every interaction occurs between neighbours of the host graph [1]. The time overhead, T, depends on the properties of the graph. Two interesting examples are the k-dimensional lattice, which for an n-qubit device has overhead T = O(n^{1/k}), and the hypercube, with overhead T = O(log^2 n) [?]. Compared with implementing each gate by a separate permutation, this is a significant saving: on the hypercube, the time to permute all n qubits is within a logarithmic factor of the time to move just one.

The hypercube is a powerful network with the ability to sort in time O(log^2 n). However, the degree of each node grows as log n, which for large n could become difficult to implement and means that new components have to be designed as the device is scaled up. In addition, optical switches in a noisy network model typically suffer losses, so it is appealing to reduce the degree to a small constant. In this paper we improve on the approach taken by Beals et al. in two directions. We reduce the required degree of the network to a small constant and, at the same time, cut the overhead to 6 log n (see Table I for a comparison to previous work). This lowers the cost of implementing arbitrary quantum algorithms on a physical device and makes the required networks more realistic. A device built using this architecture is truly scalable: additional nodes have the same small degree as the existing qubits. The lower degree also reduces the total number of connections by a factor of O(log n).

In Section II we introduce hypercube-like networks and, in particular, the so-called cyclic butterfly network. We then discuss the properties of a cyclic butterfly graph that we need for the main result, which is presented in Section III. Some alternative networks and the application of these ideas to near-term experiments on noisy network architectures are discussed in the conclusion.
TABLE I. The time, T, and space, S, overhead of embedding a quantum circuit into the graph restricted by the physical implementation. A key limitation is the degree of the graph, which corresponds to the number of interactions per qubit. Previous results have applied to the 1D and 2D nearest-neighbour (n.-n.) and hypercube graphs. The final line summarizes the main result of this paper. We show that using a cyclic butterfly network reduces both the degree and the time overhead in emulating a quantum circuit on a physically realistic device.


II. HYPERCUBIC NETWORKS 

We represent a network of qubits as an undirected graph. Nodes correspond to single qubits, or a qubit plus a single ancilla, and edges correspond to the allowed interactions. The problem of permuting qubits is then similar to routing packets of information in a synchronous parallel computer. SWAP gates exchange quantum information between two nodes, or move a quantum state into a node provided there is an available ancilla qubit in the state |0⟩. Compared with parallel classical computing, the parameters we are interested in are somewhat different. For example, we think of each node as a single qubit (or a pair of qubits) rather than a computing node capable of complex operations. We clearly distinguish the off-line classical computation, which is essentially free (provided it runs in polynomial time), from the on-line quantum computation. We also impose the restriction that no two 'packets' can be stored at a single node; there is no 'buffering' space in a single qubit.

The quantum computer is required to work synchronously at the logical level. Of course, at the physical scale, entanglement generation or magic state distillation will be probabilistic and gate times will vary. We do not address these issues here but rather assume that sufficient physical resources allow the system to function effectively as a synchronous device.

The computational power of a network is typically described in terms of its ability to emulate the complete graph. Hypercubic networks are variants of the hypercube that use nodes of constant degree yet retain its computational power to within a small constant factor. Since we consider each node as a qubit, the low degree means that each qubit needs only a few possible interaction partners. In addition, hypercubic networks typically have a nice scaling property, since we can use the same components in a quantum computer of any size (although the distance of the interactions may grow). There are many hypercubic networks, prominent examples being the butterfly, cube-connected cycles, Benes network, shuffle-exchange and the de Bruijn network (see, for example, Ref. [?]). We will use the so-called cyclic butterfly network (defined below), which has two useful properties: it embeds a Benes network, and it is invariant under cyclic permutations of its columns.


A. The cyclic butterfly network 

The n = r 2^r nodes of an r-dimensional cyclic butterfly network (also called a wrapped butterfly) can be described in terms of an array with 2^r rows and r columns. Each node is labelled by a pair (w, i), where w is an r-bit word corresponding to one of the 2^r rows and i labels the column. Two nodes (w, i) and (v, i + 1 mod r) are connected by an edge if either they are in the same row, w = v, or w and v differ in precisely one bit, in position i. There are no other connections in the network, so the degree of every node equals 4 (for r ≥ 3). An example of an n = 3 × 2^3 node cyclic butterfly network is given in Fig. 1.

The cyclic butterfly network is closely related to the hypercube. Merging the r nodes in every row into a single node results in the 2^r-node hypercube. Like the hypercube, the butterfly network has a simple recursive structure: one r-dimensional butterfly contains two (r - 1)-dimensional butterflies.
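The definition and the hypercube relation can be checked on small instances. The sketch below is a minimal illustration, assuming that "bit in position i" means the bit selected by the mask 1 << i; the function names are hypothetical. It builds the adjacency sets, then confirms that every node has degree 4, that merging rows leaves exactly the hypercube edges, and that shifting every column by one while rotating the row word by one bit maps the graph onto itself (the cyclic invariance mentioned above).

```python
def cyclic_butterfly(r):
    """Adjacency sets of the r-dimensional cyclic (wrapped) butterfly.

    Node (w, i): w is an r-bit row word, i in range(r) is the column.
    (w, i) ~ (v, i + 1 mod r) iff v == w (straight edge) or v == w ^ (1 << i) (cross edge).
    """
    size = 2 ** r
    adj = {(w, i): set() for w in range(size) for i in range(r)}
    for w in range(size):
        for i in range(r):
            j = (i + 1) % r
            for v in (w, w ^ (1 << i)):
                adj[(w, i)].add((v, j))
                adj[(v, j)].add((w, i))
    return adj


def merge_rows(adj):
    """Collapse every row w to a single node and return the edges between rows."""
    return {frozenset((w, v)) for (w, _), nbrs in adj.items() for (v, _) in nbrs if v != w}


def shift_columns(node, r):
    """Map (w, i) to (w rotated left by one bit, i + 1 mod r)."""
    w, i = node
    rotated = ((w << 1) | (w >> (r - 1))) & (2 ** r - 1)
    return rotated, (i + 1) % r


if __name__ == "__main__":
    r = 3
    adj = cyclic_butterfly(r)
    print(len(adj), {len(nbrs) for nbrs in adj.values()})  # 24 {4}
    hypercube = {frozenset((w, w ^ (1 << b))) for w in range(2 ** r) for b in range(r)}
    print(merge_rows(adj) == hypercube)  # True
    print(all({shift_columns(x, r) for x in adj[u]} == adj[shift_columns(u, r)] for u in adj))  # True
```

For r = 2 the forward and backward columns coincide and the degree drops to 3, which is why the degree-4 check starts at r = 3.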

There are two properties of cyclic butterfly networks that we make use of in our efficient algorithm for moving qubits. The first property is that the graph embeds a so-called Benes network [?], meaning that if we traverse the graph with column label increasing from i = 0 → r and then back, i = r → 0, we can implement any permutation of the w = 0, ..., 2^r - 1 row labels without collisions. The second property is that the graph is cyclic: relabelling the columns i → i + 1 mod r (together with the corresponding cyclic shift of the bits of w) results in the same cyclic butterfly graph. Combining these two properties means that the qubits of every column can traverse a Benes network simultaneously. Thus on a cyclic butterfly we can permute the 2^r row elements in every column at the same time, without collisions. Note that this is trivially true on a square √n × √n lattice: we can simultaneously permute the entries of every column independently. The crucial difference is that on a cyclic butterfly the time taken is only 2r ≈ 2 log n, as opposed to √n on a square lattice.
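The Benes property can be made concrete with the classical looping algorithm. The sketch below routes a permutation through a textbook Benes network on N = 2^k inputs (2k - 1 stages of 2×2 switches); it illustrates why any permutation can be routed without collisions but is not the embedding into the cyclic butterfly itself, and the names benes_route and benes_destination are hypothetical. Inputs are assigned to the upper or lower sub-network by 2-colouring the cycles formed by partners on shared input and output switches, and the two halves are then routed recursively.

```python
def benes_route(perm):
    """Switch settings routing perm through a Benes network (looping algorithm).

    perm[x] is the output for input x; len(perm) must be a power of two, >= 2.
    Returns 0/1 for a single 2x2 switch (1 = crossed), otherwise
    (input_settings, (upper_network, lower_network), output_settings).
    """
    n = len(perm)
    if n == 2:
        return int(perm[0] == 1)
    inv = [0] * n
    for x, y in enumerate(perm):
        inv[y] = x
    side = [None] * n                    # 0 = upper sub-network, 1 = lower
    for start in range(n):
        x = start
        while side[x] is None:           # walk one cycle of switch constraints
            side[x] = 0
            y = inv[perm[x] ^ 1]         # shares x's output switch: other side
            side[y] = 1
            x = y ^ 1                    # partner of y on its input switch: upper side
    half = n // 2
    sub = [[0] * half, [0] * half]
    for x in range(n):
        sub[side[x]][x // 2] = perm[x] // 2
    in_set = [side[2 * j] for j in range(half)]        # crossed if input 2j goes down
    out_set = [side[inv[2 * m]] for m in range(half)]  # crossed if output 2m comes from below
    return in_set, (benes_route(sub[0]), benes_route(sub[1])), out_set


def benes_destination(net, x, n):
    """Follow input x through the switch settings and return its output index."""
    if n == 2:
        return x ^ net
    in_set, subs, out_set = net
    j = x // 2
    c = (x & 1) ^ in_set[j]              # sub-network entered by input x
    m = benes_destination(subs[c], j, n // 2)
    return 2 * m + (c ^ out_set[m])


if __name__ == "__main__":
    perm = [5, 2, 7, 0, 3, 6, 1, 4]
    net = benes_route(perm)
    print([benes_destination(net, x, len(perm)) for x in range(len(perm))] == perm)  # True
```

The switch settings are pure off-line classical computation, so they fit the free classical pre-processing assumed in Section II.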

III. ALGORITHM FOR PERMUTING QUBITS 

We now present the main result of the paper: the cyclic butterfly network can implement any quantum algorithm with an overhead of 6 log n.

Theorem 1 On an n-qubit cyclic butterfly network, there is a sequence of local gates with depth at most 6 log n such that the qubit at node a is sent to node π(a), for all a = 1, ..., n and any permutation π : [1, n] → [1, n].

Proof. We use the row and column structure of the graph. The destination of every qubit is labelled by one of the 2^r rows, w, and one of the r columns, indexed by i = 0, ..., r - 1. We implement a permutation of all nodes in three steps using this structure: we first permute within the rows, then within the columns and finally within the rows again. The only moves we are allowed to make are swapping two qubits or moving a qubit from one node into a neighbour's ancilla. In particular, no two qubits can occupy the same node in a single step.

We first permute the entries in each row in such a way that the destination rows of the qubits in each column become distinct, i.e. after this row permutation, column i contains every destination row w = 0, ..., 2^r - 1 exactly once, for all i = 0, ..., r - 1. This is made possible by Hall's Matching Theorem [?], also called Hall's marriage theorem since it gives a condition under which two groups of men and women can all happily marry. A matching in a graph is a set of edges that have no common vertices. Hall's theorem gives a necessary and sufficient condition for a bipartite graph to have a matching that covers every vertex on one side, and it is commonly used in routing problems.

We use the permutation π to construct a bipartite “routing graph” (U, V, E) containing 2 · 2^r nodes, U = {u_1, ..., u_{2^r}} and V = {v_1, ..., v_{2^r}}, and r 2^r edges, E = {e_1, ..., e_{r 2^r}}. The U nodes represent the original row location of each qubit and the V nodes are their destination rows. If a qubit in row u_i has destination row v_j, we add the edge (u_i, v_j); repeated edges are allowed, so that there are exactly r edges at every node in U and V.

Hall's Matching Theorem then tells us that we can r-colour the edges so that no colour is used twice at any node: applied to this r-regular bipartite multigraph it guarantees a perfect matching, which we give one colour and remove, leaving an (r - 1)-regular graph, and so on. We can use the Ford-Fulkerson algorithm to find each matching by reducing the problem to a maximum-flow problem [?]. We add two nodes s and t to the graph and connect s to everything in U and t to everything in V. Since every edge has unit capacity, a maximum matching is equivalent to a maximum flow from s to t. The classical computation of the Ford-Fulkerson algorithm is bounded by O(|U||E|) = O(n^2) [?]. Having coloured the edges, we now know how to permute the row elements: a qubit whose edge has colour c is moved to column c of its row. This is an operation we can implement in time 2r - 3 using an insertion sorting network, since each row contains a 1D nearest-neighbour path of r nodes (see Appendix).
The r-colouring implies that in every column, i, each destination row label appears exactly once. Using the Benes and pipelining properties of the butterfly network discussed in Sec. II A, we can sort every column according to the row labels in 2r steps. In the first r steps the qubits move from column i to column i + 1 mod r, and in the final r steps they move in the opposite direction, i → i - 1 mod r. Using a single ancilla at each node, the time cost is 2r.

The final part of the algorithm is to permute within each row according to the destination column labels. Since the destination column labels within each row are now all distinct, this is possible without collisions using insertion sort, in time 2r - 3.
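The bookkeeping of the three phases can be checked classically. The sketch below assumes a random target permutation and uses hypothetical helper names; it splits the r-regular routing multigraph into r perfect matchings with repeated augmenting-path (Kuhn) matching, a simple stand-in for the Ford-Fulkerson formulation above, and then records where each qubit sits after each phase. It verifies only that every phase is a collision-free permutation within rows or within columns; it does not emit the SWAP layers that the insertion-sort and Benes networks would supply.

```python
import random


def hall_colouring(edges, num_rows, r):
    """Split an r-regular bipartite multigraph into r perfect matchings.

    edges[k] = (u, v): qubit k starts in row u and must end in row v.
    Returns colour[k] in range(r); each colour class is a perfect matching.
    """
    colour = [None] * len(edges)
    remaining = set(range(len(edges)))
    for c in range(r):
        adj = [[] for _ in range(num_rows)]
        for k in sorted(remaining):
            adj[edges[k][0]].append(k)
        match = [None] * num_rows        # match[v] = edge currently matched into row v

        def augment(u, seen):
            # Kuhn's augmenting-path search starting from row u on the U side.
            for k in adj[u]:
                v = edges[k][1]
                if v not in seen:
                    seen.add(v)
                    if match[v] is None or augment(edges[match[v]][0], seen):
                        match[v] = k
                        return True
            return False

        for u in range(num_rows):
            if not augment(u, set()):
                raise ValueError("routing graph is not regular")
        for k in match:
            colour[k] = c
            remaining.discard(k)
    return colour


def three_phase_positions(r, dest):
    """dest maps each node (w, i) to the node its qubit must reach.

    Returns the start positions and the positions after each of the three phases.
    """
    qubits = list(dest)
    edges = [(w, dest[(w, i)][0]) for (w, i) in qubits]
    colour = hall_colouring(edges, 2 ** r, r)
    # Phase 1 (rows): a qubit whose edge has colour c moves to column c of its row.
    phase1 = [(w, colour[k]) for k, (w, _) in enumerate(qubits)]
    # Phase 2 (columns): within its column, each qubit moves to its destination row.
    phase2 = [(edges[k][1], c) for k, (_, c) in enumerate(phase1)]
    # Phase 3 (rows): within its destination row, each qubit moves to its final column.
    phase3 = [dest[q] for q in qubits]
    return qubits, phase1, phase2, phase3


if __name__ == "__main__":
    r = 3
    nodes = [(w, i) for w in range(2 ** r) for i in range(r)]
    targets = random.sample(nodes, len(nodes))
    start, p1, p2, p3 = three_phase_positions(r, dict(zip(nodes, targets)))
    # Collision-free: after every phase all n positions are distinct.
    print(all(len(set(p)) == len(nodes) for p in (p1, p2, p3)))  # True
```

Distinctness after phase 1 follows because each colour class meets every source row once, and after phase 2 because it meets every destination row once; these are exactly the two halves of the Hall argument.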

The total time overhead is thus T = (2r - 3) + 2r + (2r - 3) = 6r - 6 < 6 log n, as claimed, since log n = r + log r ≥ r for n = r 2^r (logarithms to base 2). ■

Corollary 2 A quantum computer whose n logical qubits 
are connected according to the cyclic butterfly network 
can implement any quantum algorithm with a time and 
space overhead of T = 6 log n and S = 2 respectively. 

Proof. Each time-step in a quantum circuit consists of up to n/2 two-qubit gates. The gates define the permutation, π, used in Theorem 1: we choose the destinations of each pair of qubits involved in a gate to be neighbours in the cyclic butterfly graph. The proof of Theorem 1 provides an efficient method to construct a sequence of gates implementing the permutation. Every time-step requires one permutation of the qubits, so the time overhead is precisely that given in Theorem 1, and the space overhead S = 2 accounts for the single ancilla at each node. ■


IV. CONCLUSION 

Quantum computers are fully parallel machines. Every qubit is effectively a processing node, since the identity gate will be error corrected at a cost similar to other gates. Taking this view has led to the application of techniques developed for routing in synchronous parallel (classical) computers. We presented an efficient method for compiling a quantum circuit onto a cyclic butterfly network. This improves on previous results in two respects. The interaction graph has constant degree and, at the same time, the time overhead is within a small constant factor of the best possible (the time to move a single qubit).

There are two alterations to the cyclic butterfly graph that one could make to trade off the cost of building the network against the time overhead in emulating arbitrary circuits.

1. Replace each node by a ring of 4 nodes, each connected to one of the previous edges. This reduces the connectivity to 3, the minimum possible nontrivial degree, whilst increasing the time overhead by a factor of 2.

2. Use the k-ary cyclic butterfly graph. In this case, the degree increases to 2k whilst the overhead is reduced to T = 6 log_k n.


Combining these two ideas results in a slightly more efficient solution than the cyclic butterfly graph. The k-ary cyclic butterfly with each node expanded to a ring of 2k nodes has degree 3 and time overhead T = 6k log_k n, so taking k = 3 is optimal.
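A short worked check of the last claim, assuming n is fixed and k ranges over integers k ≥ 2: writing log_k n = ln n / ln k, the only k-dependence is the prefactor k / ln k.

```latex
T(k) = 6k\log_k n = 6\,\frac{k}{\ln k}\,\ln n, \qquad
\frac{d}{dk}\!\left(\frac{k}{\ln k}\right) = \frac{\ln k - 1}{(\ln k)^2}, \qquad
\frac{2}{\ln 2} = \frac{4}{\ln 4} \approx 2.885 \;>\; \frac{3}{\ln 3} \approx 2.731 .
```

The derivative vanishes at k = e, so k / ln k decreases up to e and increases after it; among integers only k = 2 and k = 4 need to be compared with k = 3, and both give a larger overhead.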

The ideas presented here can be used when designing the communication architecture in a noisy network quantum computer. Individual nodes (or cells) correspond to a small number of physical qubits in a system such as NV centers in diamond, trapped ions or superconducting devices. Photonic channels mediate entanglement between two nodes, which can then be distilled to allow inter-node communication (see, for example, recent experimental results in NV centers [11], superconducting qubits [12] and trapped ions [13]). Nickerson et al. show how these resources could be used to implement a fault-tolerant computation via the surface code, even in the presence of noisy photonic links [14]. An alternative approach would be to take advantage of the cyclic butterfly graph and use CSS block codes. Steane described how fault-tolerant operations can be performed on separate CSS block codes via ancilla states [15]. Thus nodes could correspond to a small number of logical qubits, each in a separate block. The ancilla states would then be distilled using the photonic channel, in much the same way as 4-qubit GHZ states are required when using the surface code.
APPENDIX: SORTING NETWORKS

A sorting network is designed to sort all possible input sequences using only comparison gates acting on neighbouring nodes (x, y) ∈ G,

$$ C(x, y) = \begin{cases} (x, y) & \text{if } x \geq y, \\ (y, x) & \text{if } x < y. \end{cases} $$

That is, C(x, y) swaps the inputs if x < y and leaves them unchanged otherwise. Sorting networks have been well studied in the classical literature, and examples are known over various graphs [?]. Two examples are insertion sort and bitonic sort, which sort over the 1D nearest-neighbour and hypercube graphs respectively (see Fig. 2). With full parallelism, bubble sort and insertion sort lead to the same 1D nearest-neighbour algorithm and require time T = 2n - 3.
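One way to realise the fully parallel 1D schedule is sketched below; the function names are hypothetical. Comparator j of bubble-sort pass p fires at time step j + 2p, so comparators that share a time step act on disjoint neighbour pairs, every pair of comparators sharing a wire keeps its sequential order, and the depth is 2n - 3. The comparator follows the convention C(x, y) above, swapping when x < y, so the output is in descending order.

```python
def insertion_sort_schedule(n):
    """Time steps of a fully parallel bubble/insertion-sort network on a line of n nodes.

    Comparator (j, j + 1) of pass p fires at step j + 2p, so each step acts on
    disjoint neighbour pairs and the depth is 2n - 3 for n >= 2.
    """
    steps = [[] for _ in range(max(2 * n - 3, 0))]
    for p in range(n - 1):
        for j in range(n - 1 - p):
            steps[j + 2 * p].append((j, j + 1))
    return steps


def run_network(values, steps):
    """Apply C(x, y) from the text: swap a neighbour pair when x < y."""
    v = list(values)
    for step in steps:
        for j, k in step:
            if v[j] < v[k]:
                v[j], v[k] = v[k], v[j]
    return v


if __name__ == "__main__":
    steps = insertion_sort_schedule(8)
    print(len(steps))                                    # 13, i.e. 2 * 8 - 3
    print(run_network([3, 1, 4, 1, 5, 9, 2, 6], steps))  # [9, 6, 5, 4, 3, 2, 1, 1]
```

By the 0-1 principle, checking all 2^n binary inputs is enough to confirm that such a schedule sorts every input of length n.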

A sorting network over a graph, G, provides a method of compiling any circuit onto G. Each time-step in the original circuit defines a permutation; qubits are moved so that the gates become local in G. The classical compiler then inputs the destinations into the sorting network and, each time a comparison gate implements a SWAP, the compiler applies a SWAP gate to the corresponding qubits. By construction, every operation is local in G and, once the required gates from the sorting network have been added, the gates from the time-step in the original circuit can be enacted on neighbouring qubits. Note that it is not necessary to have a sorting network that correctly sorts all inputs; we only need to sort the inputs that appear in the circuit. In addition, one could use a different network for each time-step.
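The compilation procedure can be sketched for the 1D case, assuming the parallel bubble/insertion schedule; the names compile_swaps and apply_swaps are hypothetical. The compiler runs the comparators on the destination labels (in ascending order, which is C applied to negated labels) and records a SWAP gate in time step j + 2p whenever a comparator exchanges its inputs. Simulating pass by pass gives the same outcome as simulating step by step, because the schedule preserves the relative order of any two comparators that share a wire.

```python
def compile_swaps(dest):
    """SWAP layers moving the qubit at position p of a line to position dest[p].

    Comparators of the parallel bubble/insertion schedule act on the destination
    labels; every comparator that exchanges its inputs becomes a SWAP gate.
    """
    n = len(dest)
    labels = list(dest)
    layers = [[] for _ in range(max(2 * n - 3, 0))]
    for p in range(n - 1):
        for j in range(n - 1 - p):
            if labels[j] > labels[j + 1]:          # ascending by destination
                labels[j], labels[j + 1] = labels[j + 1], labels[j]
                layers[j + 2 * p].append((j, j + 1))
    return layers


def apply_swaps(qubits, layers):
    """Enact the SWAP layers on a list of qubit names."""
    state = list(qubits)
    for layer in layers:
        for j, k in layer:
            state[j], state[k] = state[k], state[j]
    return state


if __name__ == "__main__":
    layers = compile_swaps([2, 0, 3, 1])
    print(apply_swaps(["a", "b", "c", "d"], layers))  # ['b', 'd', 'a', 'c']
```

Only comparators that actually exchange labels emit gates, which matches the remark that the network need only handle the inputs that occur in the circuit.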
FIG. 1. A 3-dimensional cyclic butterfly graph with n = 3 × 2^3 nodes, each representing a qubit plus its ancilla. The edges represent the allowed interactions between qubits.
FIG. 2. Two examples of sorting networks on 8 inputs: (a) the insertion sort over a 1D nearest-neighbour graph, which sorts in time T = 2n - 3, and (b) the bitonic sort over the hypercube, which requires time T = ½ log n (log n + 1).



